Impact of audio segmentation and segment clustering on automated transcription accuracy of large spoken archives

نویسندگان

  • Bhuvana Ramabhadran
  • Jing Huang
  • Upendra V. Chaudhari
  • Giridharan Iyengar
  • Harriet J. Nock
چکیده

This paper addresses the influence of audio segmentation and segment clustering on automatic transcription accuracy for large spoken archives. The work forms part of the ongoing MALACH project, which is developing advanced techniques for supporting access to the world’s largest digital archive of video oral histories collected in many languages from over 52000 survivors and witnesses of the Holocaust. We present several audio-only and audio-visual segmentation schemes, including two novel schemes: the first is iterative and audio-only, the second uses audio-visual synchrony. Unlike most previous work, we evaluate these schemes in terms of their impact upon recognition accuracy. Results on English interviews show the automatic segmentation schemes give performance comparable to (exhorbitantly expensive and impractically lengthy) manual segmentation when using a single pass decoding strategy based on speaker-independent models. However, when using a multiple pass decoding strategy with adaptation, results are sensitive to both initial audio segmentation and the scheme for clustering segments prior to adaptation: the combination of our best automatic segmentation and clustering scheme has an error rate 8% worse (relative) to manual audio segmentation and clustering due to the occurrence of “speaker-impure” segments.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Segment Generation and Clustering in the HTK Broadcast News Transcription System

This paper describes the segmentation, gender detection and segment clustering scheme used in the 1997 HTK broadcast news evaluation system and presents results on both the unpartitioned 1996 development and the 1997 evaluation sets. The stages of our approach are presented, namely classification, segmentation and gender detection, gender relabelling, and clustering of speech segments. The eval...

متن کامل

Efficient Access to Lecture Audio Archives through Spoken Language Processing

The paper firstly addresses the current state of speech recognition using the “Corpus of Spontaneous Japanese (CSJ)”. It is shown that the large-scale corpus had strong impact in training acoustic and language models considering morphological and pronunciation variations which are characteristic to spontaneous Japanese. Unsupervised adaptation of these models and the speaking rate is also effec...

متن کامل

High Performance Implementation of Fuzzy C-Means and Watershed Algorithms for MRI Segmentation

Image segmentation is one of the most common steps in digital image processing. The area many image segmentation algorithms (e.g., thresholding, edge detection, and region growing) employed for classifying a digital image into different segments. In this connection, finding a suitable algorithm for medical image segmentation is a challenging task due to mainly the noise, low contrast, and steep...

متن کامل

Image Segmentation: Type–2 Fuzzy Possibilistic C-Mean Clustering Approach

Image segmentation is an essential issue in image description and classification. Currently, in many real applications, segmentation is still mainly manual or strongly supervised by a human expert, which makes it irreproducible and deteriorating. Moreover, there are many uncertainties and vagueness in images, which crisp clustering and even Type-1 fuzzy clustering could not handle. Hence, Type-...

متن کامل

Voting for two speaker

The process of locating the end points of each speakers voice in an audio file and then clustering segments based in speaker identity is called speaker segmentation. In this paper we present a method for two speaker segmentation, though it can be extended to more than two speakers. Most methods for speaker segmentation and clustering start with an initial computationally inexpensive speaker seg...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003